Multi-criterial coding sequence prediction. Combination of GeneMark with two novel, coding-character specific quantities
Authors
Abstract
This work applies two recently formulated quantities that are strongly correlated with the coding character of a sequence as an additional "module" on top of GeneMark, yielding a three-criterion method. The differences between the statistical approaches underlying the combined methods are expected to make the assignment of function to unannotated genomic sequences more efficient. The combined algorithm is used to partition a collection of GeneMark-predicted exons into sub-collections with different expected likelihoods of being coding. A further modification of the algorithm assigns to GeneMark-predicted exons an improved estimate of their probability of being coding, based on a suitable training set of GeneMark-predicted exons of known function.
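The abstract does not name the two quantities or say exactly how the three criteria are combined. As a minimal sketch only, assuming each criterion reduces to a per-exon score compared against a threshold, the partitioning into sub-collections and the training-set probability estimate could look like the Python below. The names `Exon`, `q1`, `q2`, `THRESHOLDS`, `partition`, `train` and `coding_probability` are hypothetical, the thresholds are placeholders, and the add-one (Laplace) smoothing is an illustrative choice, not the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """A GeneMark-predicted exon with three hypothetical per-exon scores."""
    gm_prob: float  # GeneMark coding probability
    q1: float       # first coding-character quantity (placeholder, not defined in the abstract)
    q2: float       # second coding-character quantity (placeholder)


# Hypothetical cut-offs; the real criteria and thresholds are not given in the abstract.
THRESHOLDS = (0.5, 0.0, 0.0)


def criteria_pattern(exon, thresholds=THRESHOLDS):
    """Return a tuple of booleans: which of the three criteria the exon satisfies."""
    scores = (exon.gm_prob, exon.q1, exon.q2)
    return tuple(s > t for s, t in zip(scores, thresholds))


def partition(exons):
    """Split exons into sub-collections keyed by their criteria pattern."""
    groups = defaultdict(list)
    for exon in exons:
        groups[criteria_pattern(exon)].append(exon)
    return dict(groups)


def train(labelled):
    """Estimate P(coding | pattern) from (exon, is_coding) pairs with add-one smoothing."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [number coding, total]
    for exon, is_coding in labelled:
        c = counts[criteria_pattern(exon)]
        c[0] += int(is_coding)
        c[1] += 1
    return {p: (k + 1) / (n + 2) for p, (k, n) in counts.items()}


def coding_probability(exon, table):
    """Look up the estimate for the exon's pattern; unseen patterns fall back to 0.5."""
    return table.get(criteria_pattern(exon), 0.5)


# Toy training set of exons with known function (invented values for illustration).
training = [
    (Exon(0.9, 1.2, 0.8), True),
    (Exon(0.8, 0.5, 0.3), True),
    (Exon(0.7, -0.4, 0.2), False),
    (Exon(0.2, -1.0, -0.5), False),
]
table = train(training)
print(coding_probability(Exon(0.95, 0.7, 0.1), table))  # prints 0.75
```

In this toy data both exons that pass all three criteria are coding, so the smoothed estimate for that sub-collection is (2 + 1) / (2 + 2) = 0.75, while each single-member non-coding pattern gets 1/3.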
Similar articles
Applications of GeneMark in Multispecies Environments
This paper is supposed to bridge the gap between practical experience in using GeneMark for a rapidly widening repertoire of genomes, and the available publications that determine and compare the gene prediction accuracy of the GeneMark method for different genomes. Here we focus on the genome-specific variability of prediction error rates and their sources. DNA sequence inhomogeneity is presen...
Gene Recognition in Cyanobacterium Genomic Sequence Data Using the Hidden Markov Model
We have developed a hidden Markov model (HMM) to detect the protein coding regions within one megabase contiguous sequence data, registered in a database called GenBank in eight entries, of the genome of cyanobacterium, Synechocystis sp. strain PCC6803. Detection of the coding regions in the database entry was performed by using HMM whose parameters were determined by taking the statistics from...
Prediction Rate of Coding Regions is Enhanced upto 99.15 % by Joint Use of GeneMark-RC and GeneHacker in Case of a Cyanobacterium
The advancement in large-scale sequencing has accelerated the production of long contiguous nucleotide sequence data. The whole genomic sequence data is currently available for several prokaryotic organisms. The first step in the analysis of genomic sequence data is to assign coding regions, which is absolutely necessary for a comparative study of one organism with the others and to elucidate com...
Probabilistic methods of identifying genes in prokaryotic genomes: Connections to the HMM theory
In this paper, we review developments in probabilistic methods of gene recognition in prokaryotic genomes with the emphasis on connections to the general theory of hidden Markov models (HMM). We show that the Bayesian method implemented in GeneMark, a frequently used gene-finding tool, can be augmented and reintroduced as a rigorous forward-backward (FB) algorithm for local posterior decoding d...
A Novel Application of GeneMark-RC to the Analysis of Prokaryotic Genomes and Human cDNAs: Sequence Data with Statistical Deviations Are Rich in Important Biological Information
Assignment of coding regions is the first step in genome sequence analysis. Although the complete genome sequences of sixteen organisms have already been published, the strategies adopted for coding region assignment are different from one organism to another, and consequently the data are not readily suited for direct comparative analysis. Evidently, a more unified method for defining coding r...
Journal title:
- Computers in Biology and Medicine
Volume 35, Issue 7
Pages: -
Publication date: 2005